98 ◾ Bioinformatics
After decompression, the program files will be in the “SPAdes-3.15.4-Linux/bin” directory.
To use the SPAdes program files from any directory, add the program path to your
“.bashrc” file:
export PATH="/home/your_path/SPAdes-3.15.4-Linux/bin:$PATH"
Replace “your_path” with the correct path to the directory on your computer. After adding
the line, restart the terminal or run “source ~/.bashrc” for the change to take effect. You can
test the installation by running the following command on the Linux command line:
spades.py --test
The above command assembles the toy read data that comes with the program. The output,
which consists of the typical SPAdes output files, is saved in the “spades_test” directory.
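The PATH setup described above can also be scripted. The following is a minimal sketch, not part of SPAdes itself: the function name “add_path_line” is hypothetical, and the example path still contains the “your_path” placeholder, which you must replace. The function appends the export line to a startup file only if the same line is not already present, so running it twice does not duplicate the entry.

```shell
#!/bin/sh
# Sketch: add a directory to PATH through a startup file such as ~/.bashrc.
# add_path_line is a hypothetical helper name, not a SPAdes command.

add_path_line() {
    # $1 = directory to add, $2 = startup file (for example "$HOME/.bashrc")
    line="export PATH=\"$1:\$PATH\""
    # -x matches whole lines only, -F treats the pattern as a fixed string
    if ! grep -qxF -e "$line" "$2" 2>/dev/null; then
        printf '%s\n' "$line" >> "$2"
    fi
}

# Example (uncomment after replacing "your_path"):
# add_path_line "/home/your_path/SPAdes-3.15.4-Linux/bin" "$HOME/.bashrc"
# . "$HOME/.bashrc"
# command -v spades.py   # prints the resolved location if PATH is set correctly
```

If “command -v spades.py” prints nothing, the path in the export line most likely does not match the directory where SPAdes was decompressed.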
Remember that we have downloaded two FASTQ files for E. coli and we used them with
ABySS above. The two files were saved in “denovo/fastq”; run the following command on
the Linux command line while you are in the working directory “denovo”:
spades.py \
--pe1-1 fastq/ERR1007381_1.fastq.gz \
--pe1-2 fastq/ERR1007381_2.fastq.gz \
--isolate \
-o spades_ecoli_ass
The above command assembles contigs and scaffolds from the paired-end reads in the input
FASTQ files and saves the output files in a new directory, “spades_ecoli_ass”. The output
files include the intermediate files used for the construction of contigs and scaffolds, log files,
and the contigs file (“contigs.fasta”) and scaffolds file (“scaffolds.fasta”) in FASTA format.
The latter represents the assembly.
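Once the run has finished, a quick sanity check of the assembly is to count the sequences and bases in the FASTA output. The sketch below uses only standard grep and awk; the function names “count_records” and “total_bases” are hypothetical helpers, and the usage lines assume that the scaffolds were written to “spades_ecoli_ass/scaffolds.fasta”, the usual SPAdes output name.

```shell
#!/bin/sh
# Sketch: basic summary statistics for a FASTA file.
# count_records and total_bases are hypothetical helper names.

count_records() {
    # Every FASTA record starts with a ">" header line
    grep -c '^>' "$1"
}

total_bases() {
    # Sum the lengths of all non-header lines
    awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}

# Example (run from the "denovo" directory after the assembly finishes):
# count_records spades_ecoli_ass/scaffolds.fasta
# total_bases spades_ecoli_ass/scaffolds.fasta
```

For a bacterial isolate such as E. coli, a total length far from the expected genome size of a few million bases would suggest that something went wrong with the input reads or the run.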
SPAdes can perform hybrid de novo assembly using either PacBio continuous long reads
(CLR) or Oxford Nanopore reads as input together with Illumina or Ion Torrent reads. Use
the “--pacbio” option for each PacBio FASTQ file and the “--nanopore” option for each
Oxford Nanopore FASTQ file. In the following example, we first download four PacBio
SMRT FASTQ files, “SRR801646”, “SRR801649”, “SRR801652”, and “SRR801638”, for E. coli
K12. We then use these four PacBio files together with the Illumina short-read files to
assemble the E. coli genome with SPAdes.
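Because “--pacbio” is given once per file, it can help to confirm that every long-read file exists before starting a long assembly run. The following is a minimal sketch under stated assumptions: the function name “pacbio_args” is hypothetical, and the files are assumed to be stored as “<accession>.fastq.gz” in a “pacbio” directory, as in the command that follows. The function prints one “--pacbio” argument per accession and stops with an error if a file is missing.

```shell
#!/bin/sh
# Sketch: build the repeated --pacbio arguments and check the inputs.
# pacbio_args is a hypothetical helper name, not a SPAdes command.

pacbio_args() {
    # $1 = directory holding the PacBio files; remaining arguments = accessions
    dir=$1
    shift
    for acc in "$@"; do
        f="$dir/$acc.fastq.gz"
        if [ ! -f "$f" ]; then
            echo "missing input: $f" >&2
            return 1
        fi
        printf '%s %s\n' --pacbio "$f"
    done
}

# Example (run from the "denovo" directory):
# pacbio_args pacbio SRR801646 SRR801649 SRR801652 SRR801638
```

The printed arguments can be pasted into the SPAdes command line, or substituted with “$(pacbio_args ...)” as long as the file paths contain no spaces. Written out explicitly, the hybrid assembly command is as follows.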
spades.py \
--pe1-1 fastq/ERR1007381_1.fastq.gz \
--pe1-2 fastq/ERR1007381_2.fastq.gz \
--pacbio pacbio/SRR801646.fastq.gz \
--pacbio pacbio/SRR801649.fastq.gz \
--pacbio pacbio/SRR801652.fastq.gz \
--pacbio pacbio/SRR801638.fastq.gz \